Abstract

Traditional approaches in the analysis of downlink systems decouple the precoding and channel 
estimation problems. However, in cellular systems with mobile users, these two problems are in fact 
tightly coupled. In this paper, this coupling is explicitly studied by accounting for channel training 
overhead and estimation error while determining the overall system throughput. The paper studies the
problem of utilizing imperfect channel estimates for efficient linear precoding and scheduling. We present
a precoding method that takes into account the degree of channel estimation error in conjunction with
the number of users. Next, we optimize the training period, which is an important operational parameter
for these systems. Finally, we present lower and upper bounds of the achievable throughput. In typical
scenarios, these bounds are close.



I. Introduction 



There is a rich and varied literature in the domain of multiple antenna cellular systems.
Ever since the introduction of multi-antenna systems, almost every combination of antennas
with physical settings has been modeled and analyzed. The bulk of this literature, however, 
has focused on developing strategies for frequency division duplex (FDD) systems, and not 
without good reason. FDD systems have dominated deployment, while interest in deploying 
time division duplex (TDD) systems has grown only in recent years. Although TDD and FDD
seem like interchangeable architectural schemes for cellular systems, there are some fundamental
differences that need to be isolated and studied in detail. The goal of this paper is to bring the 
understanding of TDD systems closer to that of FDD systems today. 

It is now well established that multiple antennas at the transmitter and receiver in a point- 
to-point communication system can greatly improve the overall throughput of the system [12], 
jSl. In a multi-user setting, this gain requires channel state information (CSI) and precoding 
strategies that use this CSI at the basestation. Given this CSI, the channel capacity problem 
can be formulated in terms of a multi-antenna Gaussian broadcast channel (BC). Over the past 
decade, the capacity of a multi-antenna Gaussian BC has been determined, and shown to be 
achieved by using dirty paper coding (DPC) in [[5]|, lIH, |I71, jHl. Subsequently, the order 
growth in the sum capacity gain with the number of antennas and the signal to noise ratio
(SNR) has been characterized in [10]. An overview of the capacity results in multi-user
multiple-input multiple-output (MIMO) channels can be found in [11].

Although dirty paper coding is known to be capacity achieving with perfect CSI, there are 
multiple issues when attempting to apply it directly to a cellular system model. First, if the 
CSI is not perfect, this optimality does not hold and there can be a significant loss in rate 
[12]. Moreover, even with perfect CSI, implementing dirty paper coding using lattices or other
structured schemes is not yet practically viable. Furthermore, we consider a practical scenario in 
which mobiles are simple low-cost devices, and we assume that they cannot cancel interference. 
Given that one of the aims of this paper is to understand channel estimation and sources of 
imperfections in TDD systems, we utilize simple linear precoders that have a greater degree of 
robustness to estimation errors. We acknowledge that these precoders are not optimal compared
to DPC. However, they are, as we shall see, a good starting point for obtaining better achievable
rates in future multi-antenna downlink TDD systems.

Given that we use linear precoding, the goal of this paper is to analyze a multi-antenna 
downlink TDD system with channel training and estimation error factored into the net throughput 
expression. One of the primary differences between TDD and FDD systems is the means by
which channel training and the resulting estimation are conducted. In FDD systems, a common means
of gaining CSI is feedback from the users to the basestation. In TDD systems, channel reciprocity
can be used to train on the reverse link and obtain an estimate of the channel at the basestation.
Reciprocity thus eliminates the need for a feedback mechanism (along with forward training)



May 11, 2010 



DRAFT 






to be developed. In the literature, joint precoding and feedback schemes for FDD
systems have been studied in great detail [13], [14], [15], [16], [17] (see the prior work section for
details). In a similar vein, we find that a joint study of channel estimation and precoding for
TDD systems is needed to understand the resulting overall system throughput. To provide some
typical system parameters, consider a carrier frequency of 1900 MHz and a (maximum) mobile
velocity of 150 miles/hour. Then, the coherence time is approximately 400 μs [18]. With a typical
coherence bandwidth of 50 to 200 kHz, the effective symbol duration for narrow-band operation is
approximately 5 to 20 μs. This leads to a short coherence time of 20 to 80 symbols,
which clearly motivates our joint study of channel training, channel estimation and precoding.
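
The arithmetic behind the 20 to 80 symbol figure can be checked with a short sketch. It assumes, as the text suggests, that the narrow-band symbol duration is the reciprocal of the coherence bandwidth; the function name `coherence_symbols` is only illustrative.

```python
def coherence_symbols(coherence_time_s, coherence_bandwidth_hz):
    """Coherence interval length in symbols.

    Assumes the narrow-band symbol duration equals 1 / coherence bandwidth.
    """
    symbol_duration_s = 1.0 / coherence_bandwidth_hz
    return coherence_time_s / symbol_duration_s


COHERENCE_TIME_S = 400e-6  # approximate value quoted in the text

# 50 kHz coherence bandwidth -> 20 us symbols -> about 20 symbols
symbols_at_50khz = coherence_symbols(COHERENCE_TIME_S, 50e3)
# 200 kHz coherence bandwidth -> 5 us symbols -> about 80 symbols
symbols_at_200khz = coherence_symbols(COHERENCE_TIME_S, 200e3)
```

With so few symbols per coherence interval, every symbol spent on training is a noticeable fraction of the interval.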

Our analytical framework considers a downlink system with a large number of base-station 
antennas (along the lines of the framework studied in [19]). In this framework, our focus is not
on systems specified by current standards such as WiMax and LTE that use only 2 to 4 antennas.
Instead, our focus is on possible future generations of wireless systems where an antenna array
with a hundred or more antennas at the base-stations is an attractive approach. Preliminary
feasibility studies show that 120 antennas require the space occupied by a cylinder of one
meter diameter and one meter height: half-wavelength circumferential spacing of 40 antennas
in each of three rings, with each ring spaced vertically two wavelengths apart. With such systems,
TDD offers a significant advantage over FDD operation. In FDD systems, the forward training
overhead needed increases with the number of base-station antennas. This overhead also increases
the (limited) feedback needed to gain CSI at the basestation, which is often neglected when FDD
systems are analyzed. In contrast, in this paper, we account for all channel training
overhead incurred in the throughput analysis we present.
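
As a rough plausibility check of the quoted array size, the sketch below computes the ring diameter and the vertical span of the rings. The carrier frequency for the feasibility study is not stated; the 1900 MHz used here is an assumption borrowed from the typical parameters given earlier, and the helper name is hypothetical.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def ring_array_dimensions(carrier_hz, antennas_per_ring, rings, ring_gap_wavelengths):
    """Diameter of one ring and vertical span between the outer rings, in meters."""
    wavelength_m = SPEED_OF_LIGHT_M_S / carrier_hz
    # Half-wavelength spacing along the circumference of each ring.
    circumference_m = antennas_per_ring * wavelength_m / 2.0
    diameter_m = circumference_m / math.pi
    # (rings - 1) gaps, each a fixed number of wavelengths.
    span_m = (rings - 1) * ring_gap_wavelengths * wavelength_m
    return diameter_m, span_m


# Assumed carrier: 1900 MHz (not stated for the feasibility study itself).
diameter_m, ring_span_m = ring_array_dimensions(1900e6, 40, 3, 2.0)
```

Under this assumption the ring diameter comes out very close to one meter, and the three rings span well under one meter vertically, which is consistent with the cylinder described above.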

The main contributions of this paper are: 

• We determine a method of linear precoding and scheduling that maximizes net throughput for
realistic TDD systems. That is, channel estimation and the consequent errors are taken into
account. The optimal precoding and scheduling are identified in the course of an asymptotic
analysis, taking the number of base station antennas to infinity.

• Our results allow us to optimize the training period in such TDD systems. In other words,
we determine the optimal trade-off between estimating the channel and using the channel.

• We derive lower (achievable) and upper bounds on the system throughput for the suggested
precoding and scheduling schemes. We demonstrate that in typical scenarios these bounds
are close and therefore allow one to accurately estimate the sum rate of the suggested
schemes. The bounds also show that the proposed schemes give significant improvement
over other schemes in the literature (in particular the one given in [19]).

It is important to emphasize that we do not limit our study to only those systems with a large 
number of base-station antennas. We focus on such systems in the first part of the paper and 
develop simple precoding schemes that take advantage of a large number of base-station antennas.
In the second part of the paper, we study a modified version of the precoder presented in [20]
that does not assume a large number of base-station antennas. In [20], a precoding matrix for
downlink systems is obtained using an iterative algorithm which attempts to determine one of
the local maxima of the sum rate maximization problem when CSI is available at the base-station
and the users. Since, in our setting, the base-station obtains CSI through training and the CSI thus may
not be perfect, we modify this algorithm to account for error in the estimation process.

A. Prior Work 

As is already well known, DPC [21] can be used as a precoding strategy when the interference
signal is known noncausally and perfectly at the transmitter. Given that translating DPC to
practice is by no means a trivial task, various alternative precoding methods with low complexity
have been studied assuming perfect CSI. Prior work on precoding [22], [23], [24], [25], [20]
demonstrates that sum rates close to sum capacity can be achieved with lower computational
complexity compared to DPC. There are also opportunistic scheduling schemes [26] with lower
complexity compared to DPC which can achieve a sum rate that asymptotically scales identically to
the sum capacity with the number of users. The existing literature on scheduling [27], [28] shows
the significance of opportunistic scheduling towards maximizing the sum rate in the downlink.

As briefly mentioned before, in FDD systems, a limited-CSI setting has been studied in great
detail primarily using a limited-feedback framework [13], [14], [15], [16], [17]. In this framework,
perfect CSI is assumed at the users and limited-feedback to base-station is studied. In [[TSl . the 
authors show that, at high SNR, the feedback rate required per user must grow linearly with the 
SNR (in dB) in order to obtain the full MIMO BC multiplexing gain. The main result in [16]
is that CSI feedback can be significantly reduced by exploiting multi-user diversity. In [17], the
authors design a joint CSI quantization, beamforming and scheduling algorithm to attain optimal 
throughput scaling. However, all these papers assume perfect channel knowledge at the users
and do not study TDD systems. The effect of training in multi-user MIMO systems using TDD
operation is studied in [19]. The authors limit the study to homogeneous users and zero-forcing
precoding. Our paper is motivated by and builds on this work on TDD systems.

B. Organization 

The rest of this paper is organized as follows. In Section II, we describe the system model
and the assumptions. We consider two transmission methods. First, we consider a transmission
method with channel training on the reverse link only in Section III. Next, we consider a transmission
method which sends forward pilots in addition to reverse pilots in Section IV. In Section V,
we provide an upper bound on the sum rate for communication schemes using linear precoding
at the base-station. We compare the performance of the various schemes considered through
numerical results in Section VI and provide our concluding remarks in Section VII.

C. Notation 

We use bold face to denote vectors and matrices. All vectors are column vectors. We use 
$(\cdot)^T$ to denote the transpose, $(\cdot)^*$ to denote the conjugate and $(\cdot)^\dagger$ to denote the Hermitian of
vectors and matrices. $\mathrm{Tr}(\mathbf{A})$ denotes the trace of matrix $\mathbf{A}$ and $\mathbf{A}^{-1}$ denotes the inverse of
matrix $\mathbf{A}$. $\mathrm{diag}\{\mathbf{a}\}$ denotes a diagonal matrix with diagonal entries equal to the components of
$\mathbf{a}$. $\succeq$ denotes element-wise greater than or equal to. $\mathrm{E}[\cdot]$ and $\mathrm{var}\{\cdot\}$ stand for expectation and
variance operations, respectively.

II. System Model 

The system model consists of a base-station with M antennas and K single antenna users. 
The base-station communicates with the users on both forward and reverse links as shown in 
Figure 1. The forward channel is characterized by the $K \times M$ matrix $\mathbf{H}$ and the forward SNRs.
The system model incorporates frequency selectivity of fading by using orthogonal frequency-
division multiplexing (OFDM). The duration of the coherence interval (defined later) in symbols
is chosen for one OFDM sub-band. For simplicity, we consider OFDM sub-bands as parallel
channels and concentrate on one OFDM sub-band (where the channel matrix is fixed and there is
no multi-path). The details of OFDM (including cyclic prefix) are completely omitted as this is
by no means the focus of the paper. Further, we make the following assumptions.

Fig. 1. Multi-User MIMO TDD System Model

1) Rayleigh block fading: The channel undergoes Rayleigh fading over blocks of T symbols 
called the coherence interval during which the channel remains constant. In Rayleigh 
fading, the entries of the channel matrix H are independent and identically distributed 
(i.i.d.) zero-mean, circularly-symmetric complex Gaussian $\mathcal{CN}(0, 1)$ random variables.

2) Reciprocity: The reverse channel between any user and the base-station (at any instant) is 
a scaled version of the forward channel. 

3) Coherent uplink transmission: Time synchronization is present in the system. 

Let the forward and reverse SNRs associated with the k-th user be $\rho_k^f$ and $\rho_k^r$, respectively. These
forward and reverse SNRs account for the average power at the base-station and the users, and
the propagation factors (including path loss and shadowing). Note that these propagation factors
change at a much larger time-scale compared to fading. Hence, in the analysis, these parameters
are treated as constants. For simplicity of notation, we ignore the time index. On the forward
link, the signal received by the k-th user is

$$x_k^f = \sqrt{\rho_k^f}\,\mathbf{h}_k^T \mathbf{s}^f + z_k^f \qquad (1)$$

where $\mathbf{h}_k^T$ is the k-th row of the channel matrix $\mathbf{H}$ and $\mathbf{s}^f$ is the $M \times 1$ signal vector. The additive
noise $z_k^f$ is i.i.d. $\mathcal{CN}(0, 1)$. The average power constraint at the base-station during transmission
is $\mathrm{E}[\|\mathbf{s}^f\|^2] = 1$ so that the total transmit power is fixed irrespective of its number of antennas.
Note that the received power depends on the channel norm and hence on the number of antennas
at the base-station. On the reverse link, the vector received at the base-station is

$$\mathbf{x}^r = \mathbf{H}^T \mathbf{E}^r \mathbf{s}^r + \mathbf{z}^r \qquad (2)$$

where $\mathbf{s}^r$ is the signal-vector transmitted by the users and

$$\mathbf{E}^r = \mathrm{diag}\left\{\left[\sqrt{\rho_1^r}\; \sqrt{\rho_2^r}\; \cdots\; \sqrt{\rho_K^r}\right]^T\right\}.$$

The components of the additive noise vector $\mathbf{z}^r$ are i.i.d. $\mathcal{CN}(0, 1)$. The power constraint at the
k-th user during transmission is given by $\mathrm{E}[\|s_k^r\|^2] = 1$ where $s_k^r$ is the k-th component of $\mathbf{s}^r$.

Remark 1: We primarily focus on short coherence intervals. The need to study short coherence 
intervals arises from the high mobility of the users. In this setting, it is important that we account 
for channel training overhead and estimation error. Our goal is to account for these factors in the 
net throughput and develop schemes that achieve high net throughput. The performance metric 
of interest is the achievable weighted-sum rate. The motivation behind looking at weighted-sum 
rate is that many algorithms implemented in the network layer and above assign weights to each 
user depending on various factors such as priority. These weights are pre-determined and known. 
For obtaining schemes of practical importance, we look at schemes with low computational 
requirements. As mentioned earlier, we consider linear precoding techniques at the base-station. 

III. Training on Reverse Link Only

In this section, we consider a transmission scheme that consists of three phases as shown in 
Figure 2: training, computation and data transmission. In the training phase, the users transmit
training sequences to the base-station on the reverse link. The base-station performs the required
computations for precoding in the computation phase. We assume that this causes a one symbol
delay in order to emphasize the delay in computation/control. In practice, this delay is a system
dependent parameter. In the data transmission phase, the base-station transmits data symbols to
the selected users.

Fig. 2. Different phases in a coherence interval

Remark 2: In this transmission method, the users do not obtain any information regarding the 
instantaneous channel. The base-station obtains an estimate of the instantaneous channel. This 
is very different from the usual setting where the users also have estimates of channel gains. As 
a result, the analysis is very different as well. 

Our goal is to obtain a simple precoding method that can achieve high weighted-sum rate. We 
consider the setting of a large number of base-station antennas in this section, and take advantage
of this setting to derive a simple precoding method. The capacity region of the system described
in Section II is not known even in the single user setting. In addition, capacity achieving schemes
can in general be very complex to implement in practice. Therefore, our approach is to obtain
variants of well-studied simple algorithms in the perfect CSI setting that are applicable in the
imperfect CSI setting, and analyze the system performance. In particular, we consider MMSE
channel estimation, opportunistic scheduling of users based on channel gains, and generalized 
zero-forcing (described later) precoding. The parameters used in the algorithm are optimized for 
improved performance. Next, we provide the details of the algorithm and our analysis. 

A. Channel Estimation 

Channel reciprocity is one of the key advantages of time-division duplex (TDD) systems over 
frequency-division duplex (FDD) systems. We exploit this property to perform channel estimation 
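by transmitting training sequences on the reverse link. Every user transmits a sequence of training
signals of $\tau^r$ symbols duration in every coherence interval. The k-th user transmits the training
sequence vector $\sqrt{\tau^r}\,\boldsymbol{\psi}_k^\dagger$. We use orthonormal sequences, which implies $\boldsymbol{\psi}_i^\dagger \boldsymbol{\psi}_j = \delta_{ij}$ where $\delta_{ij}$
is the Kronecker delta.

Remark 3: The use of orthogonal sequences restricts the maximum number of users to $\tau^r$,
i.e., $K \le \tau^r$.

The training signal matrix received at the base-station is

$$\mathbf{Y} = \sqrt{\tau^r}\,\mathbf{H}^T \mathbf{E}^r \boldsymbol{\Psi}^\dagger + \mathbf{V}^r$$

where $\boldsymbol{\Psi} = [\boldsymbol{\psi}_1\; \boldsymbol{\psi}_2\; \cdots\; \boldsymbol{\psi}_K]$ ($\boldsymbol{\Psi}^\dagger \boldsymbol{\Psi} = \mathbf{I}$) and the components of $\mathbf{V}^r$ are i.i.d. $\mathcal{CN}(0, 1)$. The
base-station obtains the linear minimum mean-square error (LMMSE) estimate of the channel

$$\hat{\mathbf{H}} = \mathrm{diag}\left\{\left[\frac{\sqrt{\rho_1^r \tau^r}}{1 + \rho_1^r \tau^r}\; \cdots\; \frac{\sqrt{\rho_K^r \tau^r}}{1 + \rho_K^r \tau^r}\right]^T\right\} \boldsymbol{\Psi}^T \mathbf{Y}^T. \qquad (3)$$

The estimate $\hat{\mathbf{H}}$ is the conditional mean of $\mathbf{H}$ given $\mathbf{Y}$. Therefore, $\hat{\mathbf{H}}$ is the MMSE estimate as
well. By the properties of conditional mean and joint Gaussian distribution, the estimate $\hat{\mathbf{H}}$ is
independent of the estimation error $\tilde{\mathbf{H}} = \mathbf{H} - \hat{\mathbf{H}}$ [29]. The components of $\hat{\mathbf{H}}$ are independent
and the elements of its k-th row are $\mathcal{CN}\left(0, \frac{\rho_k^r \tau^r}{1 + \rho_k^r \tau^r}\right)$. In addition, the components of $\tilde{\mathbf{H}}$ are
independent and the elements of its k-th row are $\mathcal{CN}\left(0, \frac{1}{1 + \rho_k^r \tau^r}\right)$.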
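
The estimator and the two variances stated above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the function names are hypothetical, and the orthonormal training sequences are generated with a QR decomposition purely for illustration (any $\boldsymbol{\Psi}$ with $\boldsymbol{\Psi}^\dagger\boldsymbol{\Psi} = \mathbf{I}$ would do).

```python
import numpy as np


def complex_gaussian(rng, *shape):
    """Draw i.i.d. CN(0, 1) samples of the given shape."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)


def lmmse_channel_estimate(H, rho_r, tau_r, rng):
    """Simulate reverse-link training and return the LMMSE estimate of H (K x M)."""
    K, M = H.shape
    # Assumed training design: tau_r x K matrix with orthonormal columns.
    Psi, _ = np.linalg.qr(complex_gaussian(rng, tau_r, K))
    E_r = np.diag(np.sqrt(rho_r))
    V_r = complex_gaussian(rng, M, tau_r)
    # Received training matrix: Y = sqrt(tau_r) H^T E^r Psi^dagger + V^r.
    Y = np.sqrt(tau_r) * H.T @ E_r @ Psi.conj().T + V_r
    # Per-user LMMSE scaling applied to Psi^T Y^T, as in (3).
    gain = np.sqrt(rho_r * tau_r) / (1.0 + rho_r * tau_r)
    return gain[:, None] * (Psi.T @ Y.T)


rng = np.random.default_rng(0)
K, M, tau_r = 4, 4000, 8  # M is large only so that sample averages are accurate
rho_r = np.array([0.1, 0.5, 1.0, 2.0])
H = complex_gaussian(rng, K, M)
H_hat = lmmse_channel_estimate(H, rho_r, tau_r, rng)

# Empirical per-user variances of the estimate and of the estimation error.
est_var = np.mean(np.abs(H_hat) ** 2, axis=1)
err_var = np.mean(np.abs(H - H_hat) ** 2, axis=1)
```

The sample variances `est_var` and `err_var` should be close to $\rho_k^r\tau^r/(1+\rho_k^r\tau^r)$ and $1/(1+\rho_k^r\tau^r)$, respectively, and they sum to roughly one for every user.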

B. Generalized Zero-Forcing Precoding 

In order to deal with heterogeneous users, we propose the following generalized zero-forcing 
(ZF) precoding. This is performed in two steps: (i) selection of users, and (ii) precoding
optimization for selected users. Let the scheduling algorithm that selects the users be denoted by
$S(\hat{\mathbf{H}}) = \{S_1, S_2, \ldots, S_N\} \subseteq \{1, 2, \ldots, K\}$, i.e., based on the channel estimate $\hat{\mathbf{H}}$ the scheduling
algorithm selects users $S_1, S_2, \ldots, S_N$. Next, let $p_1, \ldots, p_K$ be some positive constants. Let

$$\mathbf{D}_S = \mathrm{diag}\left\{\left[p_{S_1}^{-1/2}\; p_{S_2}^{-1/2}\; \cdots\; p_{S_N}^{-1/2}\right]^T\right\}$$

and $\hat{\mathbf{H}}_S$ be the matrix formed by the rows in set $S(\hat{\mathbf{H}})$ of matrix $\hat{\mathbf{H}}$. Similarly, define $\mathbf{H}_S$ and
$\tilde{\mathbf{H}}_S$.

Let $\hat{\mathbf{H}}_{DS} = \mathbf{D}_S \hat{\mathbf{H}}_S$. The generalized zero-forcing precoding matrix is defined as

$$\mathbf{A}_{DS} = \frac{\hat{\mathbf{H}}_{DS}^\dagger \left(\hat{\mathbf{H}}_{DS} \hat{\mathbf{H}}_{DS}^\dagger\right)^{-1}}{\sqrt{\mathrm{Tr}\left[\left(\hat{\mathbf{H}}_{DS} \hat{\mathbf{H}}_{DS}^\dagger\right)^{-1}\right]}}. \qquad (4)$$

This precoding matrix is normalized so that

$$\mathrm{Tr}\left(\mathbf{A}_{DS}^\dagger \mathbf{A}_{DS}\right) = 1.$$

For this linear precoding method, the transmission signal-vector for the selected users is given
by

$$\mathbf{s}^f = \mathbf{A}_{DS}\,\mathbf{q}. \qquad (5)$$

Clearly the base-station transmit power constraint can be satisfied irrespective of the values of
$p_1, \ldots, p_K$ by imposing the condition $\mathrm{E}[\|q_n\|^2] = 1, \forall n \in \{1, \ldots, N\}$.

This generalized zero-forcing precoding method requires a scheduling algorithm and a choice
of the $p_i$ values. These are explained later in this section. Next, we characterize the achievable
throughput using this precoding method.
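
A minimal NumPy sketch of the precoder in (4) follows. It assumes the diagonal scaling uses $p_{S_n}^{-1/2}$, so that the effective gain of the n-th selected user scales as $\sqrt{p_{S_n}}$; the function name and the random test channel are illustrative only.

```python
import numpy as np


def generalized_zf_precoder(H_hat_S, p_S):
    """Return the generalized ZF precoding matrix and its normalization scalar.

    H_hat_S: N x M estimated channel rows of the selected users.
    p_S:     length-N positive constants of the selected users.
    """
    D_S = np.diag(p_S ** -0.5)              # assumed scaling p^{-1/2}
    H_DS = D_S @ H_hat_S
    gram_inv = np.linalg.inv(H_DS @ H_DS.conj().T)
    chi = 1.0 / np.sqrt(np.trace(gram_inv).real)
    A_DS = chi * (H_DS.conj().T @ gram_inv)  # pseudo-inverse scaled to unit trace
    return A_DS, chi


rng = np.random.default_rng(1)
H_hat_S = (rng.standard_normal((3, 8)) + 1j * rng.standard_normal((3, 8))) / np.sqrt(2.0)
p_S = np.array([0.5, 1.0, 2.0])
A_DS, chi = generalized_zf_precoder(H_hat_S, p_S)

# Effective channel seen through the estimate: diagonal, with gains sqrt(p) * chi.
effective = H_hat_S @ A_DS
```

The product `H_hat_S @ A_DS` is diagonal, so the estimated channel is inverted up to per-user gains, and `np.trace(A_DS.conj().T @ A_DS)` equals one, which is the power normalization stated above.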



C. Achievable Throughput 

In this section, we obtain an achievable throughput for the system under consideration. Given 
a scheduling algorithm, we denote the probability of selecting the k-th user as $\gamma_k$. The throughput
derived depends on the scheduling strategy through the random variable $\chi$ (defined later) and the
probabilities of selecting the users. Recall that M is the number of antennas at the base-station,
K is the number of users, $\rho_k^f$ is the forward SNR associated with the k-th user and $\rho_k^r$ is the
reverse SNR associated with the k-th user. Let the weight associated with the k-th user be $w_k$.
The base-station performs MMSE channel estimation as described in Section III-A. For channel
estimation, the training period used is $\tau^r \ge K$ symbols.

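Theorem 1: Consider the precoding method described above. Then, the following weighted-sum
rate is achievable during downlink transmission:

$$R_\Sigma = \sum_{k=1}^{K} \gamma_k w_k \log_2\left(1 + \frac{\rho_k^f\, p_k\, \mathrm{E}^2[\chi]}{1 + \rho_k^f\left(\frac{1}{1+\rho_k^r\tau^r} + p_k\,\mathrm{var}\{\chi\}\right)}\right), \qquad (6)$$

where $\chi$ is the scalar random variable given by

$$\chi = \left(\mathrm{Tr}\left[\left(\hat{\mathbf{H}}_{DS}\hat{\mathbf{H}}_{DS}^\dagger\right)^{-1}\right]\right)^{-1/2}. \qquad (7)$$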

Proof Idea: Since the users do not know the instantaneous channels, each user uses the
expected value of its effective channel. Therefore, the channel variation around the expected
value contributes to the effective noise. The imperfect channel knowledge at the base-station also
contributes to the effective noise. We show that the effective noise is uncorrelated with the signal,
and therefore worst-case Gaussian noise of the same variance can be used to obtain this achievable
weighted-sum rate. The detailed proof is given in the Appendix.

Note that the values $\mathrm{E}[\chi]$ and $\mathrm{var}\{\chi\}$ can be determined via a one-time calculation with high
precision. Next, we perform precoding optimization and user selection.

D. Optimization of Precoding Matrix

We introduced the parameters $p_1, \ldots, p_K$ in the generalized zero-forcing precoding to handle
the heterogeneity of users, i.e., differences in the weights, the forward SNRs and the reverse
SNRs associated with users. In this section, our goal is to obtain these parameters as a function
of the weights, the forward SNRs and the reverse SNRs. We make the following simplifications
to achieve our goal.

1) The performance metric of interest is the achievable weighted-sum rate $R_\Sigma$ in (6). However,
$R_\Sigma$ is a function of the scheduling algorithm. We consider the case of selecting all users
to obtain $p_1, \ldots, p_K$.

2) We would like to choose non-negative values for $p_1, \ldots, p_K$ such that $R_\Sigma$ in (6) is
maximized. However, this is a hard problem to analyze as closed-form expressions for
the expectation and the variance terms in (6) are unknown. We consider the asymptotic
regime $M/K \gg 1$ as this is appropriate in this section.

Remark 4: Apart from making the problem mathematically tractable, the asymptotic regime
$M/K \gg 1$ is of interest due to the following reasons: i) the system constraints $K \le \tau^r$, $\tau^r < T$
place an upper bound on K, independent of the number of antennas, and ii) the base-station can
be equipped with many antennas each powered by its own low-power tower-top amplifier [19].

From the weak law of large numbers, it is known that $\lim_{M/K \to \infty} \frac{1}{M}\mathbf{Z}\mathbf{Z}^\dagger = \mathbf{I}_K$ where $\mathbf{Z}$ is the
$K \times M$ random matrix whose elements are i.i.d. $\mathcal{CN}(0, 1)$. Therefore, $\mathbf{Z}\mathbf{Z}^\dagger$ can be approximated
by $M\mathbf{I}_K$. Hence, the random variable $\chi$ in (7) can be approximated as

$$\chi \approx \sqrt{\frac{M}{\sum_{i=1}^{K} a_i p_i}} \qquad (8)$$

where $a_i = \frac{1 + \rho_i^r \tau^r}{\rho_i^r \tau^r}$. Substituting (8) in (6), we get



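$$R_\Sigma \approx J(\mathbf{p}) = \sum_{i=1}^{K} w_i \log_2\left(1 + \frac{b_i p_i}{\sum_{j=1}^{K} a_j p_j}\right)$$

where $b_i = \frac{M \rho_i^f}{1 + \frac{\rho_i^f}{1 + \rho_i^r \tau^r}}$. Under this approximation, we can find the optimal values for
$p_1, \ldots, p_K$ that maximize $J(\mathbf{p})$ as described below.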

Theorem 2: An optimal solution of the problem $\max_{\mathbf{p}} J(\mathbf{p})$ is of the form $c\,\mathbf{p}^*$
where c is any positive real number and $\mathbf{p}^* = [p_1^*\; p_2^*\; \cdots\; p_K^*]^T$ is given by

$$p_i^* = \max\left\{0, \left(\frac{w_i}{\nu^* a_i} - \frac{1}{b_i}\right)\right\}. \qquad (9)$$

The positive real number $\nu^*$ is unique and given by

$$\sum_{i=1}^{K} a_i p_i^* = 1.$$

Proof: The proof idea is to introduce an additional constraint to obtain a convex optimization
problem. We show that the introduction of the additional constraint does not affect the optimal
value of the optimization problem.

Note that $w_i > 0$, $b_i > 0$ and $a_i > 0$. Let $\mathbf{a} = [a_1\; a_2\; \ldots\; a_K]^T$. We consider the optimization
problem

$$\text{maximize } J(\mathbf{p}) \quad \text{subject to } \mathbf{p} \succeq 0. \qquad (10)$$

Since $J(\mathbf{p}) = J(c\,\mathbf{p})$ for any $c > 0$ and $\mathbf{p} \succeq 0$, a vector $\mathbf{p}$ such that $\mathbf{a}^T\mathbf{p} = c$ is an optimal solution to
(10) if and only if $(1/c)\,\mathbf{p}$ is an optimal solution to the convex optimization problem

$$\text{minimize } -\sum_{i=1}^{K} w_i \log(1 + b_i p_i) \quad \text{subject to } \mathbf{p} \succeq 0,\; \mathbf{a}^T\mathbf{p} = 1. \qquad (11)$$

In order to solve (11), we introduce Lagrange multipliers $\boldsymbol{\lambda} \in \mathbb{R}^K$ for the inequality constraints
$\mathbf{p} \succeq 0$ and $\nu \in \mathbb{R}$ for the equality constraint $\mathbf{a}^T\mathbf{p} = 1$. The necessary and sufficient conditions
for optimality are given by the Karush-Kuhn-Tucker (KKT) conditions [30]. These conditions are

$$\mathbf{p}^* \succeq 0, \quad \mathbf{a}^T\mathbf{p}^* = 1, \quad \boldsymbol{\lambda}^* \succeq 0,$$
$$\lambda_i^* p_i^* = 0, \quad -\frac{w_i b_i}{1 + b_i p_i^*} - \lambda_i^* + \nu^* a_i = 0, \quad i = 1, \ldots, K.$$

This set of equations can be simplified to

$$\sum_{i=1}^{K} a_i \max\left\{0, \left(\frac{w_i}{\nu^* a_i} - \frac{1}{b_i}\right)\right\} = 1. \qquad (12)$$

Since the left-hand side of (12) is an increasing function of $1/\nu^*$, this equation has a unique solution,
which can be easily computed numerically using binary search. This completes the proof. ■
The optimized $\mathbf{p}^*$ given by (9) is substituted in (4) to obtain the optimized precoding matrix.
We use this optimized precoding matrix even when the number of users K is comparable to the number
of base-station antennas M. We denote the scheme where we use optimized $p_i$ values for
precoding by Scheme-1 and the scheme where we use $p_i = 1$ for precoding by Scheme-0.
In both schemes, we consider the trivial scheduling of selecting all users. Note that the
weights $w_i$ are assumed constant only over a coherence interval, i.e., for a period of T OFDM
symbols. The weights may change from one coherence interval to the next in accordance with
changing network requirements (for example, the weights may be selected according to the users'
downlink queue lengths).
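
The water-filling type solution of Theorem 2 can be computed with a few lines of bisection. The sketch below is one possible implementation under stated assumptions: it searches over $\mu = 1/\nu$ (the left-hand side of (12) is nondecreasing in $\mu$), the function names are hypothetical, and the inputs `w`, `a`, `b` are arbitrary illustrative numbers rather than values from the paper.

```python
import numpy as np


def optimal_precoding_weights(w, a, b, tol=1e-12, max_iter=200):
    """Solve sum_i a_i * max(0, w_i/(nu a_i) - 1/b_i) = 1 by bisection on mu = 1/nu."""
    w, a, b = (np.asarray(x, dtype=float) for x in (w, a, b))

    def constraint_value(mu):
        # a_i * max(0, w_i mu / a_i - 1/b_i) = max(0, w_i mu - a_i / b_i) since a_i > 0.
        return np.sum(np.maximum(0.0, w * mu - a / b))

    lo, hi = 0.0, 1.0
    while constraint_value(hi) < 1.0:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if constraint_value(mid) < 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    mu = 0.5 * (lo + hi)
    p_star = np.maximum(0.0, w * mu / a - 1.0 / b)
    return p_star, 1.0 / mu


def objective(p, w, a, b):
    """Approximate weighted-sum rate J(p); invariant to positive scaling of p."""
    return np.sum(w * np.log2(1.0 + b * p / (a @ p)))


# Illustrative inputs only.
w = np.array([1.0, 1.0, 2.0])
a = np.array([1.5, 1.2, 3.0])
b = np.array([50.0, 5.0, 0.1])
p_star, nu_star = optimal_precoding_weights(w, a, b)
```

In this example the third user has a very small $b_i$ and receives $p_i^* = 0$, mirroring the way classical water-filling switches off weak channels.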

E. Scheduling Strategy 

The scheduling strategy proposed is opportunistic scheduling of users based on scaled esti- 
mated channel gains of users (details given later). We ignore the spatial separability/orthogonality 
of channels for the following reasons. As mentioned earlier, the transmission method in this
section is of interest in the large number of base-station antennas setting. In this setting, the spatial
separability/orthogonality of channels plays a less important role. Also, the channel estimate at
the base-station is expected to be poor. The prediction of channel orthogonality based on this 
poor estimate is generally inaccurate. In addition, brute-force search over subsets of users is 
computationally complex. In the second part of this paper, for the general setting, we consider 
schemes that use spatial separability/orthogonality of channels.

1) Homogeneous Users: First, we consider the special case where the users are statistically
identical. In this homogeneous setting, the forward SNRs from the base-station to all the users
are equal (given by $\rho^f$) and the reverse SNRs from all the users to the base-station are equal (given
by $\rho^r$). Furthermore, the weights assigned to all the users are unity, i.e., $w_k = 1$. The need for
explicit scheduling arises due to the ZF based precoding used. With perfect channel knowledge
at the base-station ($\hat{\mathbf{H}} = \mathbf{H}$) and no scheduling ($N = K$), the ZF precoding diagonalizes the
effective forward channel and all users see the same effective channel gains.

We use the following simple heuristic rule at the base-station. In every coherence interval,
the base-station selects those users with the largest estimated channel gains. This rule is motivated
by the expectation term $\mathrm{E}[\chi]$ appearing in the achievable weighted-sum rate in (6). Let
$\hat{\mathbf{h}}_{(1)}^T, \hat{\mathbf{h}}_{(2)}^T, \ldots, \hat{\mathbf{h}}_{(K)}^T$ be the norm-ordered rows of the estimated channel matrix $\hat{\mathbf{H}}$. Then, the
matrix $\hat{\mathbf{H}}_S$ is given by $\hat{\mathbf{H}}_S = [\hat{\mathbf{h}}_{(1)}\; \hat{\mathbf{h}}_{(2)}\; \ldots\; \hat{\mathbf{h}}_{(N)}]^T$ and the achievable sum rate in (6) with
maximization over N becomes

$$R_\Sigma = \max_N\; N \log_2\left(1 + \frac{\rho^f \frac{\rho^r\tau^r}{1+\rho^r\tau^r}\, \mathrm{E}^2[\eta]}{1 + \rho^f\left(\frac{1}{1+\rho^r\tau^r} + \frac{\rho^r\tau^r}{1+\rho^r\tau^r}\,\mathrm{var}\{\eta\}\right)}\right). \qquad (13)$$

Here, the random variable

$$\eta = \left(\mathrm{Tr}\left[\left(\mathbf{U}\mathbf{U}^\dagger\right)^{-1}\right]\right)^{-1/2}$$

where $\mathbf{U}$ is the $N \times M$ matrix formed by the N rows with largest norms of a $K \times M$ random
matrix $\mathbf{Z}$ whose elements are i.i.d. $\mathcal{CN}(0, 1)$. We use the value $N_{opt}$ for N that maximizes the
objective function in (13), which is a function of the system parameters that can be computed
numerically.

Net achievable sum rate accounts for the reduction in achievable sum rate due to training. In
every coherence interval of T symbols, the first $\tau^r$ symbols are used for training on the reverse link, one
symbol is used for computation and the remaining $(T - \tau^r - 1)$ symbols are used for transmitting
information symbols as shown in Figure 2. The training length $\tau^r$ can be chosen such that the net
throughput of the system is maximized. Thus, the net achievable sum rate is defined as

$$R_{net} = \max_{\tau^r}\; \frac{T - \tau^r - 1}{T}\, R_\Sigma \qquad (14)$$

subject to $\tau^r \le T - 1$ and $\tau^r \ge K$.
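
The numerical computation mentioned above can be sketched as follows. This is only an illustration under explicit assumptions: the per-user SINR is taken to be $\rho^f g\,\mathrm{E}^2[\eta] / (1 + \rho^f((1-g) + g\,\mathrm{var}\{\eta\}))$ with $g = \rho^r\tau^r/(1+\rho^r\tau^r)$ and $\eta = (\mathrm{Tr}[(\mathbf{U}\mathbf{U}^\dagger)^{-1}])^{-1/2}$, the moments of $\eta$ are estimated by Monte Carlo, and all function names and parameter values are hypothetical.

```python
import numpy as np


def eta_moments(M, K, N, trials=400, seed=0):
    """Monte Carlo estimate of E[eta] and var{eta} for the N largest-norm rows."""
    rng = np.random.default_rng(seed)
    samples = np.empty(trials)
    for t in range(trials):
        Z = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2.0)
        strongest = np.argsort(np.linalg.norm(Z, axis=1))[-N:]
        U = Z[strongest]
        gram_inv = np.linalg.inv(U @ U.conj().T)
        samples[t] = 1.0 / np.sqrt(np.trace(gram_inv).real)
    return samples.mean(), samples.var()


def sum_rate(N, tau_r, rho_f, rho_r, mean_eta, var_eta):
    """Assumed homogeneous sum rate for N scheduled users (bits per symbol)."""
    g = rho_r * tau_r / (1.0 + rho_r * tau_r)  # variance of the channel estimate
    signal = rho_f * g * mean_eta ** 2
    noise = 1.0 + rho_f * ((1.0 - g) + g * var_eta)
    return N * np.log2(1.0 + signal / noise)


def best_net_rate(M, K, T, rho_f, rho_r):
    """Search over training length and number of scheduled users."""
    moments = {N: eta_moments(M, K, N) for N in range(1, K + 1)}
    best = (0.0, None, None)
    for tau_r in range(K, T):  # K <= tau_r <= T - 1
        for N, (mean_eta, var_eta) in moments.items():
            rate = (T - tau_r - 1) / T * sum_rate(N, tau_r, rho_f, rho_r, mean_eta, var_eta)
            if rate > best[0]:
                best = (rate, tau_r, N)
    return best


# Illustrative parameters only (not taken from the paper's numerical section).
net_rate, tau_opt, N_opt = best_net_rate(M=16, K=4, T=30, rho_f=10.0, rho_r=1.0)
check_mean, check_var = eta_moments(M=64, K=4, N=4)
```

Because the moments of $\eta$ depend only on M, K and N, they are computed once per N and reused for every candidate training length, which reflects the "one-time calculation" remark made earlier.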
2) Heterogeneous Users: In this section, we propose the following heuristic scheduling
strategy for heterogeneous users.

Let $\mathbf{z}_1^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_K^T$ be the rows of the matrix

$$\mathbf{Z} = \mathrm{diag}\left\{\left[\sqrt{\frac{\rho_1^r\tau^r}{1+\rho_1^r\tau^r}}\; \cdots\; \sqrt{\frac{\rho_K^r\tau^r}{1+\rho_K^r\tau^r}}\right]^T\right\}^{-1} \hat{\mathbf{H}}$$

where $\hat{\mathbf{H}}$ is the estimated channel given by (3). Note that $\mathbf{Z}$ is normalized such that the entries
are independent and identically distributed. In every coherence interval, the users are ordered
such that $p_{(1)}^* \|\mathbf{z}_{(1)}\|^2 \ge p_{(2)}^* \|\mathbf{z}_{(2)}\|^2 \ge \cdots \ge p_{(K)}^* \|\mathbf{z}_{(K)}\|^2$ and the first N users under this ordering
are selected. The value $N_{opt}$ is used for N that maximizes the net achievable weighted-sum rate
defined below. The intuition behind this strategy is that $p_k^*$ is nearly proportional to the average
power assigned to the k-th user and $\|\mathbf{z}_k\|^2$ captures the instantaneous variation in power.
Similar to the homogeneous case, we define the net achievable weighted-sum rate as

$$R_{net} = \max_{\tau^r, N}\; \frac{T - \tau^r - 1}{T}\, R_\Sigma \qquad (15)$$

subject to the constraints $N \le K$, $\tau^r \ge K$ and $\tau^r \le T - 1$. $R_\Sigma$ is given by (6). We denote
the scheme where we use this scheduling strategy along with optimized $p_i$ values for precoding
by Scheme-2. We provide numerical results showing the improvement obtained by using this
strategy in Section VI.
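
The ordering rule for heterogeneous users can be sketched in a few lines. The code assumes the normalization divides the k-th row of the channel estimate by $\sqrt{\rho_k^r\tau^r/(1+\rho_k^r\tau^r)}$ and ranks users by $p_k^*\|\mathbf{z}_k\|^2$; the function name and the small test channel are made up for illustration.

```python
import numpy as np


def select_users(H_hat, rho_r, tau_r, p_star, N):
    """Return indices of the N users with the largest p*_k ||z_k||^2, plus the metric."""
    rho_r = np.asarray(rho_r, dtype=float)
    p_star = np.asarray(p_star, dtype=float)
    # Normalize each row of the estimate so that its entries have unit variance.
    scale = np.sqrt(rho_r * tau_r / (1.0 + rho_r * tau_r))
    Z = H_hat / scale[:, None]
    metric = p_star * np.sum(np.abs(Z) ** 2, axis=1)
    order = np.argsort(-metric, kind="stable")  # descending metric
    return order[:N], metric


# Toy example: three users, two antennas, equal reverse SNRs.
tau_r = 3
rho_r = np.array([1.0, 1.0, 1.0])
scale = np.sqrt(rho_r * tau_r / (1.0 + rho_r * tau_r))
H_hat = scale[:, None] * np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
p_star = np.array([10.0, 1.0, 0.1])
selected, metric = select_users(H_hat, rho_r, tau_r, p_star, N=2)
```

In the toy example the third user has the largest instantaneous gain but the smallest $p_k^*$, so it is not scheduled; the rule weighs the instantaneous channel against the average power the precoder would assign.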

F. Optimal Training Length 

We consider the problem of finding the optimal training length in the homogeneous setting 
when the scheduling strategy proposed in Section IIII-EI is used. The objective is to maximize 
the net achievable sum rate given by (fT5l) . For given values of M,K,T,p-l^ and p*", it seems 
intractable to obtain a closed-form expression for the optimal training length. Therefore, we 
look at the limiting cases p'' — and p'' — )• oo to understand the behavior of the optimal 
training length with reverse SNR. 

In the limit p'' — )• 0, we can approximate the net rate as 



We use the fact that log(l + x) !^ x as x — )• to obtain the approximation 

T - r"' - 1 

i?net ~ di (16) 
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where di is a positive constant. It is clear that (fT6l) is maximized when r'' = ^^y^ if we assume 
T > 2/i and T is odd. In the limit p*^ — )■ oo, we can approximate the net rate as 

T-t'' -1 

Rnet ~ d2 

where d2 is a positive constant. This expression is maximized by the minimum possible training 
length which is = K. 

The approximations suggests that nearly half the coherence time should be spent for training 
when the reverse SNR is very low and the minimum possible number of symbols (which is K) 
should be spent for training when reverse SNR is very high. Note that this conclusion is similar 
to the result in (SB for MIMO. 
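
The two limiting regimes can be checked numerically with a short search over integer training lengths. The sketch below is only illustrative: it assumes that the low-SNR net rate scales as $\tau^r (T - \tau^r - 1)/T$ (the form implied by the stated maximizer $(T-1)/2$) and that the high-SNR net rate scales as $(T - \tau^r - 1)/T$. The function name `best_training_length` and the values of $T$ and $K$ are hypothetical.

```python
def best_training_length(T, K, low_snr):
    """Return the integer tau_r in [K, T-1] maximizing the limiting net-rate shape."""
    def shape(tau):
        data_fraction = (T - tau - 1) / T
        # Low reverse SNR (assumed form): rate grows linearly with tau.
        # High reverse SNR (assumed form): rate does not depend on tau.
        return tau * data_fraction if low_snr else data_fraction
    return max(range(K, T), key=shape)

print(best_training_length(T=31, K=8, low_snr=True))   # 15, i.e. (T - 1) / 2
print(best_training_length(T=31, K=8, low_snr=False))  # 8, i.e. K
```

The search is exhaustive because the feasible set has at most $T - K$ points, so no closed form is needed for intermediate SNRs either.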

In summary, we proposed a new precoding method referred to as generalized zero-forcing
precoding. It consists of a scheduling component and an optimization component. The scheduling
component is performed using opportunistic scheduling heuristics. The optimization component
is performed by solving a convex optimization problem that results from the asymptotic regime of a large
number of base-station antennas. The resulting precoding is simple and therefore has significant
practical value. We demonstrate the improvement obtained in net throughput through numerical
examples in Section VI.

IV. Training on Reverse and Forward Links 

In this section, we consider a transmission method that sends forward pilots in addition to
reverse pilots. We do not limit our approach to a large number of
base-station antennas.

In the transmission method considered in the previous section, the users do not obtain any
knowledge about the instantaneous channel. Every user can be provided with partial knowledge
about its effective channel gain in one of the following two ways: 1) the base-station can send
quantized information about the effective channel gains to the users, or 2) the base-station can send
forward pilots to the users so that the users can estimate the effective gains. It is hard to account
for the overhead when the base-station sends quantized information about the effective channel gains.
In addition, pilot-based channel training is conventional in wireless systems. Therefore, we focus

¹There has been some parallel work in [32]. The authors consider two-way training [33] and study two variants of linear
MMSE precoders as alternatives to the linear zero-forcing precoder used in [19].

Fig. 3. Reverse and Forward Pilots



on sending pilots in the forward link. This leads to a transmission method consisting of four
phases (reverse pilots, computation, forward pilots and data transmission), as shown in
Figure 3. In this scheme, the users can obtain effective channel gain estimates at the expense of
increased training overhead.

A. Channel Estimation and Precoding 

As explained in Section III-A, the users transmit orthogonal training sequences on the reverse
link. From these training sequences, the base-station obtains the MMSE estimate of the channel.
The base-station uses this channel estimate $\hat{\mathbf{H}}$ to form a precoding matrix to perform linear
precoding. Let $\mathbf{A}$ denote any precoding matrix which is a function of the channel estimate, i.e.,
$\mathbf{A} = f(\hat{\mathbf{H}})$. The precoding function $f(\cdot)$ usually depends on system parameters such as the
forward SNRs, reverse SNRs and weights assigned to the users. We require that the precoding
matrix is normalized so that $\mathrm{Tr}(\mathbf{A}^\dagger \mathbf{A}) = 1$. The transmission signal-vector is given by $\mathbf{s}_f = \mathbf{A}\mathbf{q}$,
where $\mathbf{q} = [q_1\ q_2\ \cdots\ q_K]^T$ is the vector of information symbols for the users. The net achievable
rate derived later in this section is valid for any precoding function. Next, we describe a particular
precoding method.
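
The trace normalization can be imposed on any candidate precoding matrix by a single rescaling. The following is a minimal sketch, assuming the matrix is stored as a list of rows with real or complex entries; the function names `normalize_precoder` and `trace_AhA` are hypothetical and not part of the paper.

```python
import math

def trace_AhA(A):
    """Tr(A^H A), which equals the sum of |a_ij|^2 over all entries."""
    return sum(abs(a) ** 2 for row in A for a in row)

def normalize_precoder(A):
    """Scale A so that Tr(A^H A) = 1."""
    frob_sq = trace_AhA(A)
    if frob_sq == 0:
        raise ValueError("cannot normalize an all-zero precoding matrix")
    scale = 1.0 / math.sqrt(frob_sq)
    return [[a * scale for a in row] for row in A]

A = normalize_precoder([[1 + 1j, 0], [2, -1j]])
print(round(trace_AhA(A), 12))  # 1.0
```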

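In [20], the following approach was suggested for finding a good precoding matrix $\mathbf{A}$. Let $\mathbf{h}_i$
be the $i$-th row of the channel matrix $\mathbf{H}$ and let $\mathbf{a}_j$ be the $j$-th column of the precoding matrix $\mathbf{A}$.
The sum rate of the broadcast channel can be written in the form

\[ R(\mathbf{H}, \mathbf{A}) = \sum_{i=1}^{M} \log\left(1 + \frac{|\mathbf{h}_i \mathbf{a}_i|^2}{\sigma^2 \mathrm{Tr}(\mathbf{A}\mathbf{A}^\dagger) + \sum_{j \ne i} |\mathbf{h}_i \mathbf{a}_j|^2}\right). \]

Let

\[ b_i = |\mathbf{h}_i \mathbf{a}_i|^2 \quad \text{and} \quad c_i = \sigma^2 \mathrm{Tr}(\mathbf{A}\mathbf{A}^\dagger) + \sum_{j \ne i} |\mathbf{h}_i \mathbf{a}_j|^2. \]

Further, let $\mathbf{\Delta}$ and $\mathbf{D}$ be diagonal matrices defined as

\[ \mathbf{\Delta} = \mathrm{diag}\left(\frac{(\mathbf{H}\mathbf{A})_{11}}{c_1}, \frac{(\mathbf{H}\mathbf{A})_{22}}{c_2}, \ldots, \frac{(\mathbf{H}\mathbf{A})_{MM}}{c_M}\right) \tag{17} \]

and

\[ \mathbf{D} = \mathrm{diag}\left(\frac{b_1}{c_1(b_1 + c_1)}, \frac{b_2}{c_2(b_2 + c_2)}, \ldots, \frac{b_M}{c_M(b_M + c_M)}\right). \tag{18} \]

In [20], it is shown that the stationarity conditions $\partial R / \partial a_{ij} = 0$ imply

\[ \mathbf{A} = \left(\sigma^2 \mathrm{Tr}(\mathbf{D})\, \mathbf{I}_M + \mathbf{H}^\dagger \mathbf{D} \mathbf{H}\right)^{-1} \mathbf{H}^\dagger \mathbf{\Delta}. \tag{19} \]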



This equation allows one to use the following iterative algorithm for determining an efficient $\mathbf{A}$:

1) Assign some initial values to the matrices $\mathbf{\Delta}$ and $\mathbf{D}$, for instance $\mathbf{\Delta} = \mathbf{I}_M$, $\mathbf{D} = \mathbf{I}_M$.

2) Repeat steps 3 and 4 several times.

3) Compute $\mathbf{A}$ according to (19).

4) Compute $\mathbf{\Delta}$ and $\mathbf{D}$ according to (17) and (18).
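
The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used for the reported results: it assumes a square channel matrix (as many users as antennas), stores matrices as lists of rows, reads the diagonal entries of the first matrix in (17) as $(\mathbf{H}\mathbf{A})_{ii}/c_i$, and uses hypothetical helper names (`ctranspose`, `matmul`, `solve`, `iterate_precoder`).

```python
def ctranspose(X):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(X[i][j]).conjugate() for i in range(len(X))]
            for j in range(len(X[0]))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(complex, M[i])) + list(map(complex, B[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def iterate_precoder(H, sigma2, n_iter=50):
    """Alternate between (19) and (17)-(18), starting from Delta = D = I."""
    n = len(H)                      # square H assumed
    Hh = ctranspose(H)
    delta = [1.0] * n               # diagonal of Delta (step 1)
    d = [1.0] * n                   # diagonal of D (step 1)
    A = None
    for _ in range(n_iter):         # step 2
        # Step 3, eq. (19): A = (sigma2 Tr(D) I + H^H D H)^{-1} H^H Delta.
        DH = [[d[i] * H[i][j] for j in range(n)] for i in range(n)]
        lhs = matmul(Hh, DH)
        for i in range(n):
            lhs[i][i] += sigma2 * sum(d)
        rhs = [[Hh[i][j] * delta[j] for j in range(n)] for i in range(n)]
        A = solve(lhs, rhs)
        # Step 4, eqs. (17)-(18): refresh Delta and D from the new A.
        HA = matmul(H, A)
        tr = sum(abs(a) ** 2 for row in A for a in row)   # Tr(A A^H)
        for i in range(n):
            b = abs(HA[i][i]) ** 2
            c = sigma2 * tr + sum(abs(HA[i][j]) ** 2 for j in range(n) if j != i)
            delta[i] = HA[i][i] / c
            d[i] = b / (c * (b + c))
    return A

A = iterate_precoder([[1, 0], [0, 1]], sigma2=1.0)
print(abs(A[0][0]), abs(A[0][1]))
```

For the identity channel with $\sigma^2 = 1$ the iteration settles at $\mathbf{A} = \mathbf{I}/3$ after the first pass. Because the sum rate is unchanged when $\mathbf{A}$ is multiplied by a scalar, the result can be rescaled afterwards to meet the trace normalization.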

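This approach can be extended to the scenario in which only an estimate $\hat{\mathbf{H}}$ of the channel
matrix $\mathbf{H}$ and the statistics of the estimation error $\tilde{\mathbf{H}}$ are available. In this case, we would like to
maximize the value of the average sum rate defined by

\[ \bar{R}(\hat{\mathbf{H}}, \mathbf{A}) = \mathbb{E}_{\tilde{\mathbf{H}}}\left[R(\hat{\mathbf{H}} + \tilde{\mathbf{H}}, \mathbf{A})\right]. \]

Since the statistics of $\tilde{\mathbf{H}}$ are assumed to be known, we can generate $L$ samples $\tilde{\mathbf{H}}^{(l)}$, $l = 1, \ldots, L$,
according to these statistics. Define $\mathbf{H}^{(l)} = \hat{\mathbf{H}} + \tilde{\mathbf{H}}^{(l)}$. Then the average rate can be approximated
as

\[ \bar{R}(\hat{\mathbf{H}}, \mathbf{A}) \approx \frac{1}{L} \sum_{l=1}^{L} \sum_{i=1}^{M} \log\left(1 + \frac{|\mathbf{h}_i^{(l)} \mathbf{a}_i|^2}{\sigma^2 \mathrm{Tr}(\mathbf{A}\mathbf{A}^\dagger) + \sum_{j \ne i} |\mathbf{h}_i^{(l)} \mathbf{a}_j|^2}\right). \]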
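
The sample average above is straightforward to evaluate. The sketch below is a hedged illustration: it assumes the estimation error has i.i.d. circularly symmetric complex Gaussian entries with standard deviation `err_std`, which is only one possible error model, and the names `sum_rate` and `average_sum_rate` are hypothetical.

```python
import math
import random

def sum_rate(H, A, sigma2):
    """R(H, A) with H as a list of user rows and A as a list of rows (columns a_j)."""
    n_ant = len(A)
    tr = sum(abs(a) ** 2 for row in A for a in row)            # Tr(A A^H)
    total = 0.0
    for i in range(len(H)):
        gains = [abs(sum(H[i][m] * A[m][j] for m in range(n_ant))) ** 2
                 for j in range(len(A[0]))]                      # |h_i a_j|^2
        interference = sum(gains) - gains[i]
        total += math.log(1.0 + gains[i] / (sigma2 * tr + interference))
    return total

def average_sum_rate(H_hat, A, sigma2, err_std, L, seed=0):
    """Monte Carlo estimate of the average rate over L error samples."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(L):
        # Assumed error model: complex Gaussian with std err_std per entry.
        H_l = [[h + complex(rng.gauss(0, err_std), rng.gauss(0, err_std)) / math.sqrt(2)
                for h in row] for row in H_hat]
        acc += sum_rate(H_l, A, sigma2)
    return acc / L

H_hat = [[1, 0], [0, 1]]
A = [[0.5, 0], [0, 0.5]]
print(sum_rate(H_hat, A, 1.0), average_sum_rate(H_hat, A, 1.0, err_std=0.3, L=50))
```

With `err_std = 0` every sample equals the estimate, so the average reduces to the rate evaluated at the estimate itself.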



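We define $\mathbf{\Delta}^{(l)}$ and $\mathbf{D}^{(l)}$ as in (17) and (18) using the matrix $\mathbf{H}^{(l)}$ instead of $\mathbf{H}$. Using arguments
similar to those used in [20], we obtain that setting the derivatives of the approximated average rate with respect to the entries of $\mathbf{A}$ to zero implies

\[ \sum_{l=1}^{L} \left( \mathbf{H}^{(l)\dagger} \mathbf{\Delta}^{(l)} - \mathbf{H}^{(l)\dagger} \mathbf{D}^{(l)} \mathbf{H}^{(l)} \mathbf{A} - \sigma^2 \mathrm{Tr}(\mathbf{D}^{(l)})\, \mathbf{A} \right) = 0. \tag{20} \]

Let

\[ \mathbf{V} = \sum_{l=1}^{L} \left( \mathbf{H}^{(l)\dagger} \mathbf{D}^{(l)} \mathbf{H}^{(l)} + \sigma^2 \mathrm{Tr}(\mathbf{D}^{(l)})\, \mathbf{I}_M \right) \quad \text{and} \quad \mathbf{T} = \sum_{l=1}^{L} \mathbf{H}^{(l)\dagger} \mathbf{\Delta}^{(l)}. \]

From (20), we have that

\[ \mathbf{A} = \mathbf{V}^{-1} \mathbf{T}. \tag{21} \]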



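This allows us to use the following iterative algorithm for determining $\mathbf{A}$:

1) Assign some initial values to the matrices $\mathbf{\Delta}^{(l)}$ and $\mathbf{D}^{(l)}$, for instance $\mathbf{\Delta}^{(l)} = \mathbf{I}_M$, $\mathbf{D}^{(l)} = \mathbf{I}_M$.

2) Repeat steps 3 and 4 several times.

3) Compute $\mathbf{A}$ according to (21).

4) Compute $\mathbf{\Delta}^{(l)}$ and $\mathbf{D}^{(l)}$ according to (17) and (18), using $\mathbf{H}^{(l)}$ instead of $\mathbf{H}$.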

Remark 5: The precoding matrix is obtained using numerical techniques. It should be noted 
that the precoding matrices can be computed offline and implemented using look up tables. 
We do not provide the details of this in the paper. Since the precoding is linear, the online 
computational complexity is low. 

B. Forward Training 

The base-station transmits $\tau^f$ forward pilots so that every user can obtain an estimate of its
effective channel gain. Since we are interested in short coherence intervals, we consider the case
with very few forward pilots. Note that $\tau^f$ can be less than the number of users $K$. For this reason,
we do not restrict ourselves to orthogonal pilots in forward training. The forward pilots are obtained by
pre-multiplying the vectors $\mathbf{q}_p^{(1)}, \ldots, \mathbf{q}_p^{(\tau^f)}$ with the precoding matrix. In the case of one forward
pilot ($\tau^f = 1$), we consider the forward pilot obtained from the vector $\mathbf{q}_p^{(1)} = [1\ 1\ \cdots]^T$. In the
case of $\tau^f = 2$, we consider the forward pilots obtained from the vectors $\mathbf{q}_p^{(1)} = \sqrt{2}\,[1\ 0\ 1\ 0\ \cdots]^T$
and $\mathbf{q}_p^{(2)} = \sqrt{2}\,[0\ 1\ 0\ 1\ \cdots]^T$. It is straightforward to extend this to any number of forward pilots.
We denote the vector of forward pilots received by the $k$-th user by $\mathbf{x}_k^p$.
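
The pilot vectors for one and two forward pilots follow a simple interleaving pattern, which the sketch below reproduces. The extension to more than two pilots (vector $i$ nonzero on users $i, i + \tau^f, i + 2\tau^f, \ldots$, scaled by $\sqrt{\tau^f}$) is an assumed generalization rather than a construction stated in the text, and the function name `forward_pilot_vectors` is hypothetical.

```python
import math

def forward_pilot_vectors(K, tau_f):
    """Return tau_f pilot vectors of length K (before precoding).

    Vector i is nonzero on users i, i + tau_f, i + 2*tau_f, ... and is scaled
    by sqrt(tau_f). For tau_f > 2 this pattern is an assumed generalization.
    """
    if not 1 <= tau_f <= K:
        raise ValueError("need 1 <= tau_f <= K")
    scale = math.sqrt(tau_f)
    return [[scale if k % tau_f == i else 0.0 for k in range(K)]
            for i in range(tau_f)]

for q in forward_pilot_vectors(K=4, tau_f=2):
    print(q)
```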

C. Achievable Throughput 

We use techniques similar to those in the previous section (although the proof is more involved) to obtain the net
achievable throughput for the transmission method with reverse and forward pilots. From (1),
the signal-vector received at the users is

\[ \mathbf{x}^f = \mathbf{E}^f \mathbf{H} \mathbf{A} \mathbf{q} + \mathbf{z}^f \tag{22} \]

where $\mathbf{E}^f = \mathrm{diag}\left(\sqrt{\rho_1^f}, \ldots, \sqrt{\rho_K^f}\right)$. We denote the effective forward channel in (22)
by $\mathbf{G} = \mathbf{E}^f \mathbf{H} \mathbf{A}$ with $(i, j)$-th entry $g_{ij}$.

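Theorem 3: For the transmission method considered, a lower bound on the downlink weighted-sum
capacity during transmission is given by

\[ R_\Sigma = \sum_{k=1}^{K} w_k\, \mathbb{E}\left[ \log_2\left( 1 + \frac{\left| \mathbb{E}[g_{kk} \mid \mathbf{x}_k^p] \right|^2}{1 + \sum_{i \ne k} \mathbb{E}\left[ |g_{ki}|^2 \mid \mathbf{x}_k^p \right] + \mathrm{var}\{ g_{kk} \mid \mathbf{x}_k^p \}} \right) \right] \tag{23} \]

where $w_k$ denotes the weight assigned to the $k$-th user and the outer expectation is over the received forward pilots.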



Proof: The users use the conditional expectations of the effective channels given the received
pilots. The proof is given in the Appendix. ■
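
As read from the proof in the Appendix, the per-user term in the bound treats the conditional mean of the effective gain as the useful channel, while the residual variance and the conditional interference power add to the unit-variance noise. The sketch below evaluates that term for given conditional statistics; the function name and argument names are hypothetical, and how the conditional statistics are obtained is outside the scope of this example.

```python
import math

def user_rate_lower_bound(cond_mean, cond_var, cond_interference):
    """log2(1 + |E[g_kk|x]|^2 / (1 + sum_i E[|g_ki|^2 | x] + var{g_kk|x})).

    cond_mean:         E[g_kk | pilots], possibly complex
    cond_var:          var{g_kk | pilots}
    cond_interference: sum over i != k of E[|g_ki|^2 | pilots]
    """
    signal = abs(cond_mean) ** 2
    noise = 1.0 + cond_interference + cond_var
    return math.log2(1.0 + signal / noise)

# Perfectly known gain of magnitude 1, no interference: 1 bit per channel use.
print(user_rate_lower_bound(1.0, 0.0, 0.0))
# A stronger gain whose uncertainty and interference cancel the benefit.
print(user_rate_lower_bound(2 + 0j, 1.0, 2.0))
```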
We define the net achievable weighted-sum rate as

\[ R_{\mathrm{net}} = \max\, \frac{T - \tau^r - \tau^f - 1}{T}\, R_\Sigma \]

which is consistent with the earlier definition.

In summary, we proposed a technique that uses the channel estimate to obtain a precoding 
matrix that is "good" in expectation for many channel realizations around this estimate. We 
demonstrate the performance improvement through numerical examples in Section VI.



V. Upper Bound on Sum Rate 

As in the previous sections, we assume that an estimate $\hat{\mathbf{H}}$, the statistics of $\mathbf{H}$, $\hat{\mathbf{H}}$, and $\tilde{\mathbf{H}}$,
and the forward SNRs $\rho_k^f$ are available at the base-station. Using this information, the base-station
computes a precoding matrix $\mathbf{A}$. The signal received by the users is

\[ \mathbf{x} = \mathbf{E}^f \mathbf{H} \mathbf{A} \mathbf{q} + \mathbf{z}. \]

As before, we denote the forward pilots received by the $k$-th user by $\mathbf{x}_k^p$. Let

\[ C_j = \max_{p(q_j)} I(x_j; q_j \mid \mathbf{x}_j^p), \]

where $p(q_j)$ is the pdf of $q_j$. The sum capacity is defined by

\[ C = C_1 + \cdots + C_K. \]

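In Sections III and IV, lower bounds on $C$ were derived for different communication scenarios. The
following simple theorem defines an upper bound on $C$.

Theorem 4:

\[ C \le \sum_{j=1}^{K} \mathbb{E}\left[ \log_2\left( 1 + \frac{\rho_j^f |\mathbf{h}_j \mathbf{a}_j|^2}{1 + \sum_{i \ne j} \rho_j^f |\mathbf{h}_j \mathbf{a}_i|^2} \right) \right]. \tag{24} \]

Proof: Let $\mathbf{G} = \mathbf{H}\mathbf{A}$. Then,

\begin{align*}
C_j &= \max_{p(q_j)} I(x_j; q_j \mid \mathbf{x}_j^p) \\
&\le \max_{p(q_j)} I(x_j, \mathbf{G}; q_j \mid \mathbf{x}_j^p) = \max_{p(q_j)} \left\{ I(x_j; q_j \mid \mathbf{G}, \mathbf{x}_j^p) + I(\mathbf{G}; q_j \mid \mathbf{x}_j^p) \right\} \\
&= \max_{p(q_j)} I(x_j; q_j \mid \mathbf{G}) = \mathbb{E}\left[ \log_2\left( 1 + \frac{\rho_j^f |\mathbf{h}_j \mathbf{a}_j|^2}{1 + \sum_{i \ne j} \rho_j^f |\mathbf{h}_j \mathbf{a}_i|^2} \right) \right].
\end{align*}

Here, we used the facts that $\mathbf{G}$ and $q_j$ are independent and therefore $I(\mathbf{G}; q_j \mid \mathbf{x}_j^p) = 0$, and that
$\mathbf{x}_j^p$ is a noisy version of $\mathbf{G}$ and therefore $I(x_j; q_j \mid \mathbf{G}, \mathbf{x}_j^p) = I(x_j; q_j \mid \mathbf{G})$.

■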

It is easy to see that the same bound is valid if no forward pilots are available to the users.
In general, this upper bound is valid for any particular scheme of generating the precoding matrix
$\mathbf{A}$. Hence, the bound can be used in all communication scenarios considered in the previous
sections. In this way, we can obtain an upper bound on the sum rate of any specific
communication scenario and any specific precoding method. In the numerical results presented in the
next section, we demonstrate that the gap between the achievable rates derived in the previous
sections and the corresponding upper bound is quite narrow.
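
For a single channel realization, the quantity inside the upper bound is the sum of per-user rates when each user knows its effective channel exactly. The sketch below evaluates it for one realization, under the reading that user $j$ sees signal power $\rho_j^f |\mathbf{h}_j \mathbf{a}_j|^2$ and interference power $\sum_{i \ne j} \rho_j^f |\mathbf{h}_j \mathbf{a}_i|^2$ over unit-variance noise; averaging over many realizations would then approximate the bound. The function name `sum_rate_upper_bound` and the example values are hypothetical.

```python
import math

def sum_rate_upper_bound(H, A, rho_f):
    """Sum over users j of log2(1 + rho_j |h_j a_j|^2 / (1 + sum_{i != j} rho_j |h_j a_i|^2))."""
    total = 0.0
    for j, h in enumerate(H):
        gains = [abs(sum(h[m] * A[m][i] for m in range(len(A)))) ** 2
                 for i in range(len(A[0]))]                 # |h_j a_i|^2
        interference = sum(gains) - gains[j]
        sinr = rho_f[j] * gains[j] / (1.0 + rho_f[j] * interference)
        total += math.log2(1.0 + sinr)
    return total

s = 1.0 / math.sqrt(2.0)            # makes Tr(A^H A) = 1 for the 2 x 2 example
print(sum_rate_upper_bound([[1, 0], [0, 1]], [[s, 0], [0, s]], rho_f=[2.0, 2.0]))
```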

Instead of using a specific precoding method in Theorem 4, we can try to use a precoding
matrix $\mathbf{A}$ that maximizes (24) under the assumption that only $\hat{\mathbf{H}}$, the statistics of $\mathbf{H}$, $\hat{\mathbf{H}}$, and $\tilde{\mathbf{H}}$,
and the forward SNRs $\rho_k^f$ are available at the base-station. This would give us an upper bound that
does not depend on a specific precoding method. If such an upper bound is close
to the lower bound of some specific precoding method, we could claim not only that we have
closely identified the sum rate of that specific precoding method, but also that the scheme itself
is close to optimal linear precoding.

The problem of finding a precoding matrix $\mathbf{A}$ that provably maximizes (24), especially in
the case when the true channel matrix $\mathbf{H}$ is not available, appears to be very hard. We suggest
the following approximate approach. The algorithm described in Section IV-A allows us to find,
approximately, an $\mathbf{A}$ that provides a local maximum of

\[ \mathbb{E}_{\tilde{\mathbf{H}}}\left[R(\hat{\mathbf{H}} + \tilde{\mathbf{H}}, \mathbf{A})\right]. \]

Running the algorithm several times, with distinct random matrices for $\mathbf{\Delta}$ and $\mathbf{D}$ in step 1,
we can find several, say a hundred, local maxima of $\mathbb{E}_{\tilde{\mathbf{H}}}[R(\hat{\mathbf{H}} + \tilde{\mathbf{H}}, \mathbf{A})]$. Let C-UB-Opt be
the maximum of these local maxima. Though, strictly speaking, C-UB-Opt is not the global
maximum of $\mathbb{E}_{\tilde{\mathbf{H}}}[R(\hat{\mathbf{H}} + \tilde{\mathbf{H}}, \mathbf{A})]$, it is likely that there is no linear precoding method that would
significantly outperform C-UB-Opt. In the next section, we will use C-UB-Opt as a
scheme-independent upper bound for some communication scenarios.

VI. Numerical Results 

Scheme-UB refers to the upper bound obtained by assuming perfect knowledge of the effective
channel matrix at the users. Note that this is a scheme-dependent upper bound. We have conducted
extensive simulations for various system parameters, and the observations provided are based
on these simulations. However, we provide only a few representative numerical results here due
to lack of space.

A. Training on Reverse Link Only 

We consider this transmission method in the communication regime where SNRs are low.
Scheme-0 denotes the ZF precoding method and Scheme-1 denotes the generalized ZF precoding
method with optimized $p_i$ values but no user selection. Scheme-2 denotes the method where
user selection is used along with Scheme-1. Scheme-1 and Scheme-2 are techniques developed
in this paper. Scheme-0 refers to the scheme in [19].

1) Homogeneous Users: For homogeneous users, Scheme-1 is identical to Scheme-0. First, we
keep the training sequence length equal to the number of users, i.e., $\tau^r = K$. This setting clearly
gives the minimum channel training overhead. In Figure 4, we plot the sum rate versus the number of
users $K \in \{1, 2, \ldots, M\}$ for $M = 16$ when the forward SNR is $\rho^f = 0$ dB and the reverse SNR is $\rho^r = -10$
dB. In addition to the Scheme-0 and Scheme-2 sum rates, we plot the upper bound obtained according
to Theorem 4, the Scheme-2 performance when CSI is available at the base-station, and the DPC
upper bound. The reduction in sum rate due to the lack of full CSI at the base-station is significant. As
expected, the performance of DPC is significantly better than that of the linear precoder, especially
when $M = K$. From now on, we do not compare with DPC, as our focus is on linear precoders
with channel imperfections. Since the gap between the Scheme-2 sum rate and the Scheme-2 upper
bound is relatively small, the restriction to training on the reverse link only is not significant for

Fig. 4. Sum capacity lower bound for forward SNR of 0 dB and reverse SNR of -10 dB



the SNRs considered here. We observe that the proposed scheduling strategy used in Scheme-2
gives a significant improvement over the existing Scheme-0. In Figure 5, we plot the optimum number
of users selected by Scheme-2, $N_{\mathrm{opt}}$, versus the number of users present, $K$, for different SNRs
(mentioned in the plot) and $M = 16$.

Next, in Figure 6, we plot the net achievable sum rate given by (14) versus the number of antennas
at the base-station $M$ for coherence intervals $T \in \{10, 20, 30\}$ symbols, forward SNR $\rho^f = 0$
dB and reverse SNR $\rho^r = -10$ dB. For $T = 30$ symbols, we plot the Scheme-2 upper bound
obtained according to Theorem 4. The gap between the lower and upper bounds is relatively
small; therefore the lack of CSI at the users is not very significant for these SNRs. We observe
that the net achievable sum rate increases with $M$ for both schemes. As expected from
the numerical results above, the proposed scheduling scheme (Scheme-2) outperforms the existing
Scheme-0. We notice that the net achievable sum rate varies significantly with the coherence

Fig. 5. Optimum number of users versus total number of users



interval. This shows the importance of accounting for training overhead when studying wireless
systems with short coherence intervals (as we have done in this paper).
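
The sensitivity to the coherence interval comes largely from the pre-log factor $(T - \tau^r - 1)/T$ in the net rate. A small calculation makes this concrete; the training length of 4 symbols used here is a hypothetical value chosen for illustration, not a setting taken from the figures.

```python
def data_fraction(T, tau_r):
    """Fraction of the coherence interval left for data: (T - tau_r - 1) / T."""
    if not 0 < tau_r <= T - 1:
        raise ValueError("need 0 < tau_r <= T - 1")
    return (T - tau_r - 1) / T

for T in (10, 20, 30):
    print(T, data_fraction(T, tau_r=4))
```

With 4 training symbols, half of a 10-symbol interval is available for data, compared with three quarters of a 20-symbol interval and five sixths of a 30-symbol interval.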

2) Heterogeneous Users: We consider a coherence interval of $T = 30$ symbols and 12 users with
forward SNRs {0, 0, 0, 5, 5, 5, 5, 5, 5, 10, 10, 10} dB. The reverse SNR associated with every user
is taken to be 10 dB lower than its forward SNR. All users are assigned unit weights.
We plot the net achievable sum rate versus $M$ for this system in Figure 7. The improvement
obtained using modified ZF precoding with optimized $p_i$ values is significant. We remark that the
performance gain due to scheduling is very significant when the number of users is comparable
to the number of base-station antennas.

3) Optimal Training Length: We consider a homogeneous system with $M = 32$ antennas at
the base-station, $K = 8$ users and a coherence interval of $T = 30$ symbols. For Scheme-2, we
obtain the optimal training length and the net sum rate for different values of forward SNR






Fig. 6. Net achievable sum rate versus number of base-station antennas 



through brute-force optimization. For every forward SNR considered, we take the reverse SNR
to be 10 dB lower than the corresponding forward SNR. We plot the optimal training lengths
in Figure 8 and the net sum rates in Figure 9. The behavior of the optimal training length with reverse
SNR is as predicted in Section III-F: approximately $T/2$ in the low-SNR regime and $K$ in the high-SNR regime.
In Figure 9, we denote ZF with scheduling by ZF-Sch and the corresponding upper bound by
ZF-Sch-UB.

B. Training on Reverse and Forward Links 

We consider this transmission method for moderate to high SNRs. We use FP($n$) to denote a
precoding method using $n$ forward pilots. Note that FP(0) denotes training on the reverse
link only. We denote results obtained with zero-forcing by ZF, zero-forcing with scheduling by
ZF-Sch, the approach in [20] by SVH and the modified algorithm given in Section IV-A by

Fig. 7. Net achievable weighted-sum rate for a system with 12 users



Mod-SVH. We compare the performance of the different methods using numerical examples. For
the algorithm Mod-SVH, we use the value $L = 50$ in the simulations. We consider a system
with $K = 8$ users, $M = 8$ antennas at the base-station, a reverse training length of $\tau^r = 8$ and a
coherence interval of $T = 30$ symbols. We keep the value
of the reverse SNR 10 dB lower than the forward SNR. For the different methods considered, we
obtain the achievable sum rate for forward SNRs ranging from 5 dB to 30 dB. These sum rates
are given in Table VI-B. We plot the methods ZF-Sch-FP(0) and Mod-SVH-FP(1) in Figure
10. We observe a significant improvement in net rate from utilizing forward pilots at high forward
SNRs. In addition, it is interesting to note that we perform reasonably close to the upper bound
by using one or two forward pilots.

Fig. 8. Optimal training length versus forward SNR

VII. Conclusion 

We develop a general framework to study downlink TDD systems that accounts for channel
training overhead and channel estimation error. In contrast to the limited-feedback framework
for FDD systems, we account for all channel training overhead in the overall system throughput.
In the first part of the paper, we focus on downlink systems with a large number of antennas
at the base-station. We clearly demonstrate the advantage of TDD operation in this setting.
In particular, with an increasing number of base-station antennas, TDD operation helps in
improving the effective forward channel without affecting the training sequence length required.
We present a generalized zero-forcing precoding method in this setting. We use a combination
of a convex-optimization-based technique and opportunistic user selection to maximize the overall
system throughput. In the second part of the paper, we consider the general setting, i.e., we do
not limit our focus to downlink systems with a large number of base-station antennas. We present a

Fig. 9. Net sum rate versus forward SNR



linear precoding method that results from an approach to finding a local maximum of a non-convex
optimization problem related to the system throughput. Through simulations, we show
that these precoding schemes provide significant improvement over other schemes in the literature.

Appendix



A. Proof of Theorem [7] 



From O, the signal-vector received at the selected users is 



(25) 



where = diag 




. The effective forward channel in (25) is



G 




DS- 



(26)

Suppose that the $k$-th user is among the selected users. The signal received by the $k$-th user
is

\[ x_k^f = \mathbf{g}_k^T \mathbf{q} + z_k^f \tag{27} \]

where $\mathbf{g}_k^T$ is the row corresponding to the $k$-th user in matrix $\mathbf{G}$. From (26), we obtain

(28)



where is the $k$-th row of H and is the x 1 column-vector with $k$-th element equal to
one and all other elements equal to zero. Substituting (28) in (27) and adding and subtracting
mean from we obtain

' p{pk E [x] gfc + V pivk (x - E [x]) gfc + V pI A^^q + 4 



(29) 



p{Pfc E [x] Qk + 



where the effective noise zl = \J p{pk (x — E [x]) + y Pfch^Az)sq+z{. Note that the expected 
value of any term on the right-hand side of (29) is zero. The noise term is independent of
all other terms and



E 



4lq 



0, E 



0, E 



hr|q,H 



0. 



Using the law of iterated expectations, we have 



E 
E 



E 



E 



(x-E[x])gfcqtAl,5h* 



E 
E 



(E [x]-E[x]) = 0, 

g^qtAj^^E [h:.|q,H]] =0, 

(x-E[x])gfeq^Al,5E [K,|q,H] 

Hence, any two terms on the right-hand side of (29) are uncorrelated. The effective noise is
thus uncorrelated with the signal $q_k$. The effective noise has zero mean and variance



0. 



var < z 



l + p{E h^A^sE 



qq 



^|H,H 



A ' h* 



+ piPkvar {x} 



+ pk^^ai {x} . 



Remark 6: The effective noise is uncorrelated with the signal $q_k$, but in general it is not
independent of it. Note that we do not need independence for the proof.

In order to obtain a lower bound, we consider $(T - \tau^r - 1)$ parallel channels where the noise
is independent over time, as fading is independent over blocks. Using the fact that the worst-case
uncorrelated noise distribution is independent Gaussian noise with the same variance, we obtain the
stated lower bound on the weighted-sum rate. This completes the proof.

B. Proof of Theorem 3

In every coherence interval, the $k$-th user receives the vector $\mathbf{x}_k^p$. In the data transmission
phase, it receives

\[ x_k^f = \mathbb{E}[g_{kk} \mid \mathbf{x}_k^p]\, q_k + \hat{z}_k^f \tag{30} \]

where the effective noise is $\hat{z}_k^f = \left(g_{kk} - \mathbb{E}[g_{kk} \mid \mathbf{x}_k^p]\right) q_k + \sum_{i \ne k} g_{ki} q_i + z_k^f$. Note that the joint distribution
of $\mathbf{x}_k^p$ and $\mathbf{G}$ is known to all users. In (30), the noise term is uncorrelated with the signal
$q_k$. Note that these terms are not independent, and we do not need independence in the proof.
Following the steps used in the proof in Appendix A, we obtain the lower bound in (23).



